Part 1: Currency Conversion Analysis

Background

In this problem, we will study fluctuations in currency exchange rate over time.

The file USD-JPY.csv contains the daily USD/JPY exchange rate from January 2000 through May 31, 2022. We will aggregate the data on a weekly basis by taking the average rate within each week. The time series of interest is this weekly exchange rate; we will analyze it and its first-order difference.

Instructions on reading the data

To read the data in R, save the file in your working directory (make sure you have changed the directory if different from the R working directory) and read the data using the R function read.csv()

fpath <- "USD-JPY.csv"
df <- read.csv(fpath, header = TRUE)

Here we load the libraries needed for this data analysis:

library(mgcv)
library(lubridate)
library(dplyr)

To prepare the data, run the following code snippet. First, aggregate by week:

df$date <- as.Date(df$Date, format='%Y-%m-%d')
df$week <- floor_date(df$date, "week")


df <- df[, c("week", "jpy")]

We now form the weekly aggregated time series to use for data exploration. Please note that we will analyze the weekly aggregated data, not the original (daily) data.

agg <- aggregate(x = df$jpy, by = list(df$week), FUN = mean)
colnames(agg) <- c("week", "jpy")

jpy.ts <- ts(agg$jpy, start = 2000, freq = 52)

Please use the jpy.ts series to code and answer the following questions.

Question 1a: Exploratory Data Analysis

Before exploring the data, can you infer the data features from what you know about the USD-JPY currency exchange? Next plot the Time Series and ACF plots of the weekly data. Comment on the main features, and identify what (if any) assumptions of stationarity are violated.

Which type of model do you think will fit the data better: the trend or seasonality fitting model? Provide details for your response.

Response: General Insights on the USD-JPY Currency Rate

The time series of a currency exchange rate generally follows a trend that depends on the trade policies and international relations of the two countries. For example, some currencies have varied far more against the USD over the past century than others; recall the debate over the USD vs. the Chinese yuan. The Indian rupee has weakened considerably against the USD over the past several decades, which would clearly produce a downward trend. For USD/JPY, the rate has followed a downward trend in the long term (the yen strengthening against the dollar), but has fluctuated considerably in the recent short term.

ts.plot(jpy.ts, col = "blue", xlab = "", ylab = "USD/JPY", main = "USD/JPY Exchange Rate over Time")
grid()

acf(jpy.ts, lag.max = 52 * 4, xlab = "Lag", ylab = "ACF", main = "USD/JPY ACF Analysis")

Response: General Insights from the Graphical Analysis

From the time series plot, we see that the variance fluctuates significantly within the time window, and the trend also varies greatly over the period; the variability of this series depends on time. From the ACF plot, we see that the autocorrelation is significant and decays only slowly across all lags.

From the two plots, we can say that a trend is clearly present, but no seasonality is observed. Hence, a trend-fitting model would likely be a better fit than a seasonality model for this time series.

Question 1b: Trend Estimation

Fit the following four trend estimation models: a moving average, a parametric quadratic polynomial, a local polynomial (loess), and a smoothing splines model.

Overlay the fitted values on the original time series. Plot the residuals with respect to time for each model, and plot the ACF of the residuals for each model as well. Comment on the four models' fit and on the appropriateness of the stationarity assumption for the residuals.

# convert X axis to 0-1 scale
points <- 1:length(jpy.ts)
points <- (points - min(points)) / max(points)
# 1. Fit a moving average model
mav.model <- ksmooth(points, jpy.ts, kernel = "box")
mav.fit <- ts(mav.model$y, start = 2000, frequency = 52)
# 2. Fit a parametric quadratic polynomial model
x1 <- points
x2 <- points^2
para.model <- lm(jpy.ts ~ x1 + x2)
para.fit <- ts(fitted(para.model), start = 2000, frequency = 52)
# 3. Fit a local polynomial model
loc.model <- loess(jpy.ts ~ points)
loc.fit <- ts(fitted(loc.model), start = 2000, frequency = 52)
# 4. Fit a splines smoothing model
gam.model <- gam(jpy.ts ~ s(points))
gam.fit <- ts(fitted(gam.model), start = 2000, frequency = 52)
ts.plot(jpy.ts, xlab = "", ylab = "USD/JPY", main = "Trend Estimation Comparison")
grid()
lines(mav.fit, lwd = 2, col = "red")
lines(para.fit, lwd = 2, col = "orange")
lines(loc.fit, lwd = 2, col = "green")
lines(gam.fit, lwd = 2, col = "blue")
legend("bottomleft", legend = c("MAV", "PARA", "LOC", "GAM"),
col = c("red", "orange", "green", "blue"), lwd = 2)

Response: Comparison of the fitted trend models:

Of the four trend estimation methods tested, the smoothing splines model appears to capture the data’s trend most effectively; the other three models fail to adequately capture the notable drop in the USD/JPY exchange rate in the middle of the time period.

# Residual and residual-ACF plots for each fitted model
diff.mav <- jpy.ts - mav.fit
diff.para <- jpy.ts - para.fit
diff.loc <- jpy.ts - loc.fit
diff.gam <- jpy.ts - gam.fit
par(mfrow = c(2, 2))
ts.plot(diff.mav, xlab = "", ylab = "Residuals", main = "Moving Average")
ts.plot(diff.para, xlab = "", ylab = "Residuals", main = "Parametric Quadratic Polynomial")
ts.plot(diff.loc, xlab = "", ylab = "Residuals", main = "Local Polynomial")
ts.plot(diff.gam, xlab = "", ylab = "Residuals", main = "Splines Trend")

par(mfrow = c(2, 2))
acf(diff.mav, lag.max = 52 * 4, xlab = "", ylab = "ACF", main = "Moving Average")
acf(diff.para, lag.max = 52 * 4, xlab = "", ylab = "ACF",
main = "Parametric Quadratic Polynomial")
acf(diff.loc, lag.max = 52 * 4, xlab = "", ylab = "ACF", main = "Local Polynomial")
acf(diff.gam, lag.max = 52 * 4, xlab = "", ylab = "ACF", main = "Splines Trend")

Response: Appropriateness of the trend model for stationarity

The residuals from the trend models show clear non-stationarity, suggesting that trend removal alone, using any of the four models, is not sufficient to account for the nonstationary variation in this time series.

The ACFs of the residuals support this observation of non-stationarity: each plot shows slowly-decaying lags, which are indicative of remaining trend in the residuals.
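As an illustrative sketch with simulated data (not the jpy.ts series), the sample ACF of a random walk decays slowly, just like the trend-model residuals, while the ACF of white noise drops to near zero after lag 0:

```r
# Toy illustration: slowly-decaying ACF is the signature of a non-stationary
# (trending / random-walk) series, versus the near-zero ACF of white noise.
set.seed(1)
rw <- cumsum(rnorm(500))   # random walk: non-stationary
wn <- rnorm(500)           # white noise: stationary
acf(rw, lag.max = 50, main = "Random Walk ACF (slow decay)")
acf(wn, lag.max = 50, main = "White Noise ACF (near zero)")
```

The same slow decay seen here is what the residual ACFs above display.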

Question 1c: Differenced Data Modeling

Now plot the differenced time series and its ACF. Apply the four trend models from Question 1b to the differenced series. What can you conclude about the differenced data in terms of stationarity? Which approach would you recommend (trend removal via a fitted trend vs. differencing) to obtain a stationary process?

Hint: When TS data are differenced, the resulting data set will have an NA in the first data element due to the differencing.
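Note that base R's diff() drops the first element rather than padding with an NA, so the differenced series is one element shorter; either way, the time index must be trimmed to match, which is why points[-1] appears in the code below. A minimal sketch with a toy series:

```r
# Toy sketch: diff() shortens the series by one element per order of
# differencing, so any accompanying time index must be trimmed accordingly.
x <- ts(c(100, 102, 101, 105), start = 2000, frequency = 52)
dx <- diff(x)                    # lag-1 differences: 2, -1, 4
length(dx) == length(x) - 1      # TRUE
```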

ts.plot(diff(jpy.ts), col = "black", xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced USD/JPY Exchange Rate by Time")
grid()

acf(diff(jpy.ts), lag.max = 52 * 4, xlab = "Lag", ylab = "ACF", main = "Differenced USD/JPY ACF Analysis")

# 1. Fit a moving average model
mav.model <- ksmooth(points[-1], diff(jpy.ts), kernel = "box")
mav.fit <- ts(mav.model$y, start = 2000, frequency = 52)
ts.plot(diff(jpy.ts), xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced Moving Average Analysis")
grid()
lines(mav.fit, lwd = 2, col = "red")

# 2. Fit a parametric quadratic polynomial model
x1 <- points[-1]
x2 <- points[-1] ^ 2
para.model <- lm(diff(jpy.ts) ~ x1 + x2)
para.fit <- ts(fitted(para.model), start = 2000, frequency = 52)
ts.plot(diff(jpy.ts), xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced Parametric Quadratic Polynomial Analysis")
grid()
lines(para.fit, lwd = 2, col = "orange")

# 3. Fit a local polynomial model
loc.model <- loess(diff(jpy.ts) ~ points[-1])
loc.fit <- ts(fitted(loc.model), start = 2000, frequency = 52)
ts.plot(diff(jpy.ts), xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced Local Polynomial Analysis")
grid()
lines(loc.fit, lwd = 2, col = "green")

# 4. Fit a splines smoothing model
pts.d <- points[-1] # named vector for the smooth term
gam.model <- gam(diff(jpy.ts) ~ s(pts.d))
gam.fit <- ts(fitted(gam.model), start = 2000, frequency = 52)
ts.plot(diff(jpy.ts),
xlab = "",
ylab = "Differenced USD/JPY",
main = "Differenced Splines Smoothing Analysis")
grid()
lines(gam.fit, lwd = 2, col = "blue")

# 5. Compare all estimated trends
vals <- c(mav.fit, para.fit, loc.fit, gam.fit)
ylim <- c(min(vals), max(vals))
ts.plot(mav.fit, lwd = 2, col = "red", ylim = ylim,
xlab = "", ylab = "Differenced USD/JPY",
main = "Differenced Regression Model Comparison")
grid()
lines(para.fit, lwd = 2, col = "orange")
lines(loc.fit, lwd = 2, col = "green")
lines(gam.fit, lwd = 2, col = "blue")
legend("bottomright", legend = c("MAV", "PARA", "LOC", "GAM"),
col = c("red", "orange", "green", "blue"), lwd = 2)

Response: Comments about the stationarity of the differenced data:

The time series plots show that all four fitted trends are nearly flat and centered around zero, which both indicates an adequate fit and suggests that the differenced data are approximately stationary.

The fitted moving average trend has the least variability. The parametric quadratic model also shows little variability; the splines model deviates somewhat more, and the local polynomial model deviates the most, as the combined graph shows. The moving average trend, however, has many ‘kinks’ that capture minor movements which may not be useful for determining the overall trend.

From this analysis, we can conclude that the differenced series is approximately stationary; hence differencing is a more effective approach than trend fitting for removing the trend and obtaining a stationary process.
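As an informal supplement to the graphical evidence, one can compare summary statistics across the two halves of a series: a roughly stationary series should have a similar mean and standard deviation in each half. This is only a sketch; the helper check_halves() is our own ad hoc function, not a formal stationarity test, and applying it to diff(jpy.ts) would quantify the visual impression above.

```r
# Informal stationarity sketch: compare mean/sd of the two halves of a series.
# check_halves() is an ad hoc helper, not a formal test of stationarity.
check_halves <- function(x) {
  n <- length(x)
  first <- x[1:(n %/% 2)]
  second <- x[(n %/% 2 + 1):n]
  c(mean1 = mean(first), mean2 = mean(second),
    sd1 = sd(first), sd2 = sd(second))
}
# e.g. check_halves(diff(jpy.ts)) for the differenced exchange-rate series
```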

Part 2: Temperature Analysis

Background

In this problem, we will analyze aggregated temperature data.

The file Everest Temp Jan-Mar 2021.csv contains the hourly average temperature at the Mount Everest Base Camp for the months of January to March 2021.

Instructions on reading the data

To read the data in R, save the file in your working directory (make sure you have changed the directory if different from the R working directory) and read the data using the R function read.csv()

You will perform the analysis and modelling on the temp data column.

fpath <- "Everest Temp Jan-Mar 2021.csv"
df <- read.csv(fpath, head = TRUE)

Here are the libraries you will need:

library(mgcv)
library(TSA)
library(dynlm)
library(ggplot2)
library(lubridate) # provides ymd_hms() used below

Run the following code to prepare the data for analysis:

df$timestamp <- ymd_hms(df$timestamp)
temp <- ts(df$temp, freq = 24)

datetime <- ts(df$timestamp)

Question 2a: Exploratory Data Analysis

Plot both the Time Series and ACF plots. Comment on the main features, and identify what (if any) assumptions of stationarity are violated. Additionally, comment if you believe the differenced data is more appropriate for use in fitting the data. Support your response with a graphical analysis.

Hint: Make sure to use the appropriate differenced data.

everest <- ts(df$temp, frequency = 24)
plot(everest, xlab = "Time", ylab = "Temperature", main = "Everest Hourly Temperature")

acf(everest, lag.max = 24 * 6, main = "Everest Hourly Temperature ACF")

Response: Comments about the time series and ACF plots of the original time series

The time series plot shows that the data exhibits a fluctuating trend and clear hourly seasonality. The ACF plot exhibits lags which are both slowly decreasing and exhibit a cyclical rising and falling pattern, which confirm the presence of trend and seasonality in the data, respectively.

plot(diff(everest), xlab = "Time", ylab = "Temperature", main = "Everest Hourly Temperature: 1-Differenced")

acf(diff(everest), lag.max = 24 * 6, main = "Everest Hourly Temperature ACF: 1-Differenced")

plot(diff(everest, 24), xlab = "Time", ylab = "Temperature", main = "Everest Hourly Temperature: 24-Differenced")

acf(diff(everest, 24), lag.max = 24 * 6, main = "Everest Hourly Temperature ACF: 24-Differenced")

Response: Comments about the time series and ACF plots of the differenced time series

The plot of the first-order differenced data shows that the trend has been removed; the seasonal effect, however, still appears to be present. In the ACF of the first-order differenced data, the value at the first seasonal lag is large and decays slowly over multiples of the seasonal lag. Clearly, first-order differencing alone is not appropriate for removing the seasonality in the data.

Since first-order differencing does not adequately address seasonality, we apply a lag-24 (daily) difference as shown above. The absence of a cyclical pattern in its ACF indicates that seasonality has largely been removed; however, the slowly-decaying lags still give evidence of a remaining trend in the data.
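A quick sketch with a synthetic series shows why the lag-24 difference works: differencing at exactly the seasonal period cancels any component that repeats every 24 observations.

```r
# Toy sketch: a pure period-24 cycle is removed exactly by lag-24 differencing,
# because each observation is subtracted from the same phase one day later.
t <- 1:240
daily <- sin(2 * pi * t / 24)          # repeats every 24 observations
d24 <- diff(daily, lag = 24)
max(abs(d24))                          # essentially 0: cycle eliminated
```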

Question 2b: Seasonality Estimation

Separately fit a seasonality harmonic model and the ANOVA seasonality model to the temperature data. Evaluate the quality of each fit with residual analysis. Does one model perform better than the other? Which model would you select to fit the seasonality in the data?

## Estimate seasonality using ANOVA approach
td_lm<- dynlm(everest ~ season(everest))
summary(td_lm)
## 
## Time series regression with "ts" data:
## Start = 1(1), End = 90(24)
## 
## Call:
## dynlm(formula = everest ~ season(everest))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.3218  -2.1758  -0.0539   2.0218  11.2616 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -8.637733   0.419589 -20.586  < 2e-16 ***
## season(everest)2  -0.089922   0.593388  -0.152 0.879564    
## season(everest)3   0.007367   0.593388   0.012 0.990096    
## season(everest)4   1.270233   0.593388   2.141 0.032416 *  
## season(everest)5   3.556589   0.593388   5.994 2.40e-09 ***
## season(everest)6   4.840344   0.593388   8.157 5.79e-16 ***
## season(everest)7   5.439567   0.593388   9.167  < 2e-16 ***
## season(everest)8   5.754467   0.593388   9.698  < 2e-16 ***
## season(everest)9   5.580767   0.593388   9.405  < 2e-16 ***
## season(everest)10  5.180078   0.593388   8.730  < 2e-16 ***
## season(everest)11  4.415444   0.593388   7.441 1.44e-13 ***
## season(everest)12  3.262233   0.593388   5.498 4.31e-08 ***
## season(everest)13  2.256178   0.593388   3.802 0.000147 ***
## season(everest)14  1.569878   0.593388   2.646 0.008214 ** 
## season(everest)15  1.310044   0.593388   2.208 0.027369 *  
## season(everest)16  1.070478   0.593388   1.804 0.071371 .  
## season(everest)17  0.818689   0.593388   1.380 0.167828    
## season(everest)18  0.701811   0.593388   1.183 0.237053    
## season(everest)19  0.522100   0.593388   0.880 0.379033    
## season(everest)20  0.408722   0.593388   0.689 0.491028    
## season(everest)21  0.266567   0.593388   0.449 0.653313    
## season(everest)22  0.158144   0.593388   0.267 0.789872    
## season(everest)23  0.031811   0.593388   0.054 0.957251    
## season(everest)24  0.083378   0.593388   0.141 0.888269    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.981 on 2136 degrees of freedom
## Multiple R-squared:  0.215,  Adjusted R-squared:  0.2065 
## F-statistic: 25.43 on 23 and 2136 DF,  p-value: < 2.2e-16
plot(everest, type = "l")
lines(fitted(td_lm), col = "blue")

## Estimate seasonality using harmonic model
td_lm2 <- dynlm(everest ~ harmonic(everest))
summary(td_lm2)
## 
## Time series regression with "ts" data:
## Start = 1(1), End = 90(24)
## 
## Call:
## dynlm(formula = everest ~ harmonic(everest))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.2278  -2.3106  -0.0262   2.2835  11.9682 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  -6.62044    0.08746  -75.69   <2e-16 ***
## harmonic(everest)cos(2*pi*t) -1.40540    0.12369  -11.36   <2e-16 ***
## harmonic(everest)sin(2*pi*t)  2.22315    0.12369   17.97   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.065 on 2157 degrees of freedom
## Multiple R-squared:  0.1733, Adjusted R-squared:  0.1725 
## F-statistic: 226.1 on 2 and 2157 DF,  p-value: < 2.2e-16
plot(everest, type = "l")
lines(fitted(td_lm2), col = "purple")

## Residual process: ANOVA seasonality model
resid.1 <- residuals(td_lm)
## Residual process: harmonic seasonality model
resid.2 <- residuals(td_lm2)
y.min <- min(c(resid.1, resid.2))
y.max <- max(c(resid.1, resid.2))
ts.plot(resid.1, lwd = 2, ylab = "Residual Process", col = "blue", ylim = c(y.min, y.max))
lines(resid.2, col = "purple")
legend(x = 75, y = y.max, legend = c("ANOVA seasonality model", "Harmonic seasonality model"),
lty = 1, col = c("blue", "purple"))

acf(resid.1,lag.max=24*6,main="ANOVA seasonality model")

acf(resid.2,lag.max=24*6,main="Harmonic seasonality model")

Response: Compare Seasonality Models

Many of the regression coefficients in both models are statistically significant, indicating that both models capture a seasonal pattern. The two models perform similarly, except that, comparing the fitted values, the ANOVA model tends to overestimate the seasonal swings while the harmonic model tends to underestimate them.

The residual time series plots for both models show a fluctuating trend, suggesting that we will need to jointly fit both trend and seasonality. The ACF plots also show that the residuals are not stationary, with slowly-decreasing ACF values again suggesting a remaining trend. The ANOVA model appears to capture seasonality better, since its residual ACF does not retain the cyclical pattern seen for the harmonic model.

Question 2c: Trend-Seasonality Estimation

Using the time series data, fit the following two models to estimate the trend, with seasonality fitted using ANOVA: a parametric quadratic polynomial trend model and a non-parametric (splines) trend model.

Overlay the fitted values on the original time series. Plot the residuals with respect to time. Plot the ACF of the residuals. Comment on how the two models fit and on the appropriateness of the stationarity assumption of the residuals.

What form of modelling seems most appropriate and what implications might this have for how one might expect long term temperature data to behave? Provide explicit conclusions based on the data analysis.

time.pts <- 1:length(everest)
time.pts <- (time.pts - min(time.pts)) / max(time.pts)
x1 <- time.pts
x2 <- time.pts^2
# Parametric polynomial regression with ANOVA seasonality
lm.fit2 <- dynlm(everest ~ x1 + x2 + season(everest))
summary(lm.fit2)
## 
## Time series regression with "ts" data:
## Start = 1(1), End = 90(24)
## 
## Call:
## dynlm(formula = everest ~ x1 + x2 + season(everest))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.0762  -2.0997  -0.0568   1.8439  10.9177 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        -3.10809    0.42006  -7.399 1.96e-13 ***
## x1                -25.61966    1.02979 -24.879  < 2e-16 ***
## x2                 21.77516    0.99751  21.829  < 2e-16 ***
## season(everest)2   -0.08804    0.51511  -0.171  0.86432    
## season(everest)3    0.01113    0.51511   0.022  0.98276    
## season(everest)4    1.27587    0.51511   2.477  0.01333 *  
## season(everest)5    3.56408    0.51511   6.919 5.99e-12 ***
## season(everest)6    4.84969    0.51511   9.415  < 2e-16 ***
## season(everest)7    5.45075    0.51511  10.582  < 2e-16 ***
## season(everest)8    5.76748    0.51511  11.197  < 2e-16 ***
## season(everest)9    5.59560    0.51511  10.863  < 2e-16 ***
## season(everest)10   5.19673    0.51511  10.088  < 2e-16 ***
## season(everest)11   4.43390    0.51512   8.608  < 2e-16 ***
## season(everest)12   3.28248    0.51512   6.372 2.27e-10 ***
## season(everest)13   2.27821    0.51512   4.423 1.02e-05 ***
## season(everest)14   1.59368    0.51512   3.094  0.00200 ** 
## season(everest)15   1.33562    0.51512   2.593  0.00958 ** 
## season(everest)16   1.09781    0.51512   2.131  0.03319 *  
## season(everest)17   0.84776    0.51512   1.646  0.09996 .  
## season(everest)18   0.73262    0.51512   1.422  0.15510    
## season(everest)19   0.55464    0.51512   1.077  0.28172    
## season(everest)20   0.44298    0.51512   0.860  0.38991    
## season(everest)21   0.30254    0.51512   0.587  0.55705    
## season(everest)22   0.19582    0.51512   0.380  0.70388    
## season(everest)23   0.07117    0.51512   0.138  0.89012    
## season(everest)24   0.12442    0.51512   0.242  0.80916    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.455 on 2134 degrees of freedom
## Multiple R-squared:  0.409,  Adjusted R-squared:  0.402 
## F-statistic: 59.07 on 25 and 2134 DF,  p-value: < 2.2e-16
temp.fit.lm.seas=fitted(lm.fit2)
ggplot(df, aes(timestamp, temp)) + geom_line() + xlab("Time") + ylab("Temperature Data")+
geom_line(aes(timestamp,temp.fit.lm.seas),lwd=1,col="blue")

#Non-parametric model
hr = as.factor(format(df$timestamp,"%H"))
gam.fit.seastr = gam(everest~s(time.pts)+hr)
summary(gam.fit.seastr)
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## everest ~ s(time.pts) + hr
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.640172   0.270126 -31.986  < 2e-16 ***
## hr01        -0.089852   0.381998  -0.235 0.814064    
## hr02         0.007526   0.381998   0.020 0.984282    
## hr03         1.270502   0.381999   3.326 0.000896 ***
## hr04         3.556986   0.381999   9.312  < 2e-16 ***
## hr05         4.840889   0.382000  12.672  < 2e-16 ***
## hr06         5.440278   0.382001  14.242  < 2e-16 ***
## hr07         5.755364   0.382001  15.066  < 2e-16 ***
## hr08         5.581870   0.382002  14.612  < 2e-16 ***
## hr09         5.181406   0.382003  13.564  < 2e-16 ***
## hr10         4.417017   0.382005  11.563  < 2e-16 ***
## hr11         3.264069   0.382006   8.545  < 2e-16 ***
## hr12         2.258297   0.382008   5.912 3.94e-09 ***
## hr13         1.572299   0.382009   4.116 4.00e-05 ***
## hr14         1.312787   0.382011   3.437 0.000601 ***
## hr15         1.073562   0.382013   2.810 0.004995 ** 
## hr16         0.822133   0.382015   2.152 0.031502 *  
## hr17         0.705635   0.382017   1.847 0.064867 .  
## hr18         0.526323   0.382019   1.378 0.168429    
## hr19         0.413364   0.382022   1.082 0.279356    
## hr20         0.271646   0.382024   0.711 0.477119    
## hr21         0.163680   0.382027   0.428 0.668365    
## hr22         0.037823   0.382030   0.099 0.921142    
## hr23         0.089886   0.382032   0.235 0.814012    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##               edf Ref.df     F p-value    
## s(time.pts) 8.946  8.999 335.5  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.671   Deviance explained = 67.6%
## GCV = 6.6682  Scale est. = 6.5665    n = 2160
fit.gam.seastr = fitted(gam.fit.seastr)
ggplot(df, aes(timestamp, temp)) + geom_line() + xlab("Time") + ylab("Temperature Data")+
geom_line(aes(timestamp,fit.gam.seastr),col="purple")

resid.fit.lm = ts(resid(lm.fit2),frequency=24)
resid.fit.gam.seas = ts(resid(gam.fit.seastr),frequency=24)
y.min = min(c(resid.fit.lm,resid.fit.gam.seas))
y.max = max(c(resid.fit.lm,resid.fit.gam.seas))
ts.plot(resid.fit.lm,lwd=2,col="blue",ylim=c(y.min,y.max))
lines(resid.fit.gam.seas,col="purple")
legend(x=75,y=y.max,legend=c("Parametric Polynomial Model","Non-Parametric Model"),lty = 1, col=c("blue","purple"))

acf(resid.fit.lm,lag.max=24*6,main="Parametric Polynomial Model")

acf(resid.fit.gam.seas,lag.max=24*6,main="Non-Parametric Model")

Response: Model Comparison

From the fitted models, we see that the parametric polynomial regression fits a smooth quadratic trend to the original data while capturing the seasonality quite effectively. The non-parametric model captures the seasonality equally well, and its flexible smooth fits the trend considerably better than the quadratic polynomial does.

From the residual analysis of the two models, we see that the residuals of the parametric polynomial regression show somewhat larger variability. From the ACFs of the residuals, the residuals of the non-parametric fit appear closer to stationary, whereas those of the parametric model show stronger remaining serial correlation.

We can see that the non-parametric trend model works best for the temperature data. We expect temperature to follow a fluctuating trend with seasonality on a daily basis; hence this model can be used toward predicting temperature.

Overall, daily temperature rises and falls over time (as expected), so a seasonality model alone is not sufficient to capture the variability in the data; a trend component is needed as well.